The Beijing Academy of Artificial Intelligence (BAAI, also known as the Zhiyuan Institute), in collaboration with Shanghai Jiao Tong University, Renmin University of China, Peking University, and Beijing University of Posts and Telecommunications, has released Video-XL, a model for understanding ultra-long videos. The model demonstrates long-video understanding, a core capability of multimodal large models and a key step toward artificial general intelligence (AGI). Compared with existing multimodal large models, Video-XL shows better performance and efficiency on videos longer than 10 minutes.